

Search for: All records

Creators/Authors contains: "Acar, Umut A"

Note: Clicking a Digital Object Identifier (DOI) link takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Free, publicly-accessible full text available July 16, 2026
  2. Free, publicly-accessible full text available November 17, 2025
  3. Recent work has proposed a memory property for parallel programs, called disentanglement, shown that it is pervasive in a variety of programs written in languages ranging from C/C++ to Parallel ML, and shown that it can be exploited to improve the performance of parallel functional programs. All existing work on disentanglement, however, considers the fork/join model of parallelism and does not apply to futures, a more powerful approach to parallelism. This is not surprising: fork/join parallel programs exhibit a reasonably strict dependency structure (e.g., series-parallel DAGs), which disentanglement exploits. With futures, in contrast, parallel computations become first-class values of the language and can thus be created, passed between function calls, or stored in memory, just like ordinary values, resulting in complex dependency structures, especially in the presence of mutable state. For example, parallel programs with futures can deadlock, which is impossible with fork/join parallelism. In this paper, we are interested in the theoretical question of whether disentanglement can be extended beyond fork/join parallelism, and specifically to futures. We consider a functional language with futures, input/output (I/O), and mutable state (references) and show that a broad range of programs written in this language are disentangled. We start by formalizing disentanglement for futures and proving that purely functional programs written in this language are disentangled. We then generalize this result in three directions. First, we consider state (effects) and prove that stateful programs are disentangled if they are race free. Second, we show that race freedom is a sufficient but not a necessary condition: non-deterministic programs, e.g., those that use atomic read-modify-write operations and certain non-deterministic combinators, may also be disentangled. Third, we prove that disentangled task-parallel programs written with futures are free of deadlocks, which arise from interactions between state and the rich dependencies that futures can express. Taken together, these results show that disentanglement generalizes to parallel programs with futures, and thus its benefits may extend well beyond fork/join parallelism. (The first sketch after this list illustrates first-class futures.)
  4. Modern programming languages offer special syntax and semantics for logical fork-join parallelism in the form of parallel loops, which can be nested, e.g., a parallel loop within another parallel loop. This expressiveness comes at a price, however: on modern multicore systems, realizing logical parallelism incurs overheads for creating and managing parallel tasks, which can wipe out the benefits of parallelism. Today, application programmers are expected to cope with these overheads by manually tuning and optimizing their code. Such tuning requires programmers to reason about architectural factors hidden behind layers of software abstraction, such as task scheduling and load balancing. Managing these factors is particularly challenging when workloads are irregular, because their performance is input-sensitive. This paper presents HBC, the first compiler that translates C/C++ programs with high-level fork-join constructs (e.g., OpenMP) into binaries capable of automatically controlling the cost of parallelism and handling irregular, input-sensitive workloads. The basis of our approach is Heartbeat Scheduling, a recent proposal for automatic granularity control that is backed by formal performance guarantees. HBC binaries outperform OpenMP binaries on workloads for which even entirely manual solutions struggle to find the right balance between parallelism and its costs. (The second sketch after this list illustrates the manual granularity tuning that HBC automates.)
  5. Although functional programming languages simplify writing safe parallel programs by helping programmers avoid data races, they have traditionally delivered poor performance. Recent work improved performance by using a hierarchical memory architecture that allows processors to allocate and reclaim memory independently, without any synchronization, thus solving the key performance challenge afflicting functional programs. The approach, however, restricts mutation, or memory effects, so as to ensure "disentanglement", a low-level memory property that guarantees independence between different heaps in the hierarchy. This paper proposes techniques for supporting entanglement and thereby allowing functional programs to use mutation at will. Our techniques manage entanglement by distinguishing between disentangled and entangled objects and by shielding disentangled objects from the cost of entanglement management. We present a semantics that formalizes entanglement as a property at the granularity of memory objects, and we define several cost metrics for reasoning about and bounding the time and space costs of entanglement. We implement the techniques by extending the MPL compiler for Parallel ML. The extended compiler supports all features of the Parallel ML language, including unrestricted effects. Our experiments on a variety of benchmarks show that MPL incurs small time and space overheads compared to sequential runs, scales well, and is competitive with languages such as C++, Go, Java, and OCaml. These results show that our techniques can marry the safety benefits of functional programming with performance. (The third sketch after this list shows the kind of effectful sharing that creates entanglement.)
  6. High-level parallel languages (HLPLs) make it easier to write correct parallel programs. Disciplined memory usage in these languages enables new optimizations for hardware bottlenecks such as cache coherence. In this work, we show how to reduce the costs of cache coherence by integrating the hardware coherence protocol directly with the programming language; no programmer effort or static analysis is required. We identify a new low-level memory property, WARD (WAW Apathy and RAW Dependence-freedom), that holds by construction in HLPL programs. We design a new coherence protocol, WARDen, that uses WARD to selectively disable coherence. We evaluate WARDen with a widely-used HLPL benchmark suite on both current and future x64 machine architectures. WARDen both accelerates the benchmarks (by an average of 1.46x) and reduces energy (by 23%) by eliminating unnecessary data movement and coherence messages. (The final sketch after this list illustrates the memory discipline WARD captures.)
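
Sketch 1 (for item 3): a minimal illustration of why futures complicate dependency structure. A future is a first-class value, so it can be stored in mutable state and awaited far from its creation site. This uses OCaml 5 with the domainslib library rather than the paper's Parallel ML; the names and workload are ours, not the paper's.

```ocaml
(* First-class futures: created, stored in a ref, awaited elsewhere. *)
module T = Domainslib.Task

let () =
  let pool = T.setup_pool ~num_domains:2 () in
  T.run pool (fun () ->
    (* A future computed in parallel with the rest of the program. *)
    let fut = T.async pool (fun () ->
      List.init 1_000 succ |> List.fold_left ( + ) 0) in
    (* Unlike a fork/join task, the future can be stashed in memory ... *)
    let cell = ref fut in
    (* ... and forced later by code with no syntactic relation to it,
       which is exactly what breaks the series-parallel DAG structure. *)
    Printf.printf "sum = %d\n" (T.await pool !cell));
  T.teardown_pool pool
```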
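Sketch 2 (for item 4): the kind of manual granularity control that HBC automates. HBC itself targets C/C++ with OpenMP; this OCaml/domainslib version only illustrates the tuning burden: the cutoff constant is a machine- and input-sensitive knob that programmers would otherwise pick by hand.

```ocaml
(* Hand-tuned granularity control for a nested-parallel computation. *)
module T = Domainslib.Task

let cutoff = 20  (* hand-tuned: too low drowns in task-creation overhead,
                    too high starves the scheduler of parallelism *)

let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

let rec pfib pool n =
  if n <= cutoff then fib n  (* below the cutoff, run sequentially *)
  else
    let a = T.async pool (fun () -> pfib pool (n - 1)) in
    let b = pfib pool (n - 2) in
    T.await pool a + b

let () =
  let pool = T.setup_pool ~num_domains:4 () in
  Printf.printf "fib 32 = %d\n" (T.run pool (fun () -> pfib pool 32));
  T.teardown_pool pool
```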
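Sketch 3 (for item 5): the kind of effectful sharing that creates entanglement: one parallel branch publishes a freshly allocated object through shared mutable state, and a sibling may pick it up, creating a pointer across heaps in MPL's hierarchy. Written in plain OCaml 5 for illustration; MPL's actual heap hierarchy is not modeled here, and the names are ours.

```ocaml
(* One branch leaks a fresh allocation to a sibling via shared state. *)
module T = Domainslib.Task

let () =
  let pool = T.setup_pool ~num_domains:2 () in
  let shared : int list option Atomic.t = Atomic.make None in
  T.run pool (fun () ->
    let producer = T.async pool (fun () ->
      (* The list is allocated by this task and published via [shared];
         in a hierarchical-heap setting this is a cross-heap pointer. *)
      Atomic.set shared (Some [ 1; 2; 3 ])) in
    let consumer = T.async pool (fun () ->
      match Atomic.get shared with
      | Some l -> List.length l  (* may observe the sibling's object *)
      | None -> 0) in
    T.await pool producer;
    Printf.printf "consumer saw %d elements\n" (T.await pool consumer));
  T.teardown_pool pool
```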
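Sketch 4 (for item 6): our reading of the memory discipline behind WARD, not code from the paper. During the parallel region each task writes only its own locations and never reads a sibling's in-flight writes, so the coherence protocol's invalidation traffic inside the region serves no program-visible purpose until the join. OCaml/domainslib is used purely for illustration; WARDen itself operates at the hardware-protocol level.

```ocaml
(* Disjoint writes inside the region; all reads happen after the join. *)
module T = Domainslib.Task

let () =
  let n = 1_000_000 in
  let out = Array.make n 0.0 in
  let pool = T.setup_pool ~num_domains:4 () in
  T.run pool (fun () ->
    (* Each index is written by exactly one task, and no task reads
       another task's writes: RAW dependence-freedom within the region. *)
    T.parallel_for pool ~start:0 ~finish:(n - 1)
      ~body:(fun i -> out.(i) <- sqrt (float_of_int i)));
  (* Reads occur only after the parallel region has joined. *)
  Printf.printf "out.(9) = %f\n" out.(9);
  T.teardown_pool pool
```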